Skip to content

Fix bugs in cleanup_unsubmitted_forms scheduled job#21

Open
Ochiengsteven wants to merge 8 commits into
vula-africa:mainfrom
Ochiengsteven:fix/cleanup-unsubmitted-forms-bugs
Open

Fix bugs in cleanup_unsubmitted_forms scheduled job#21
Ochiengsteven wants to merge 8 commits into
vula-africa:mainfrom
Ochiengsteven:fix/cleanup-unsubmitted-forms-bugs

Conversation

@Ochiengsteven
Copy link
Copy Markdown

@Ochiengsteven Ochiengsteven commented Jan 24, 2026

Summary

  • Fix date calculation bug that was missing milliseconds conversion
  • Fix query logic to properly find forms older than 7 days (not just exactly 7 days old)
  • Improve deletion order in transaction to respect foreign key constraints
  • Handle edge case where token has no associated relationship
  • Batched processing: Handles millions of records without memory issues

Summary by CodeRabbit

  • Refactor

    • Cleanup now runs in batched, transaction-based operations with retention-based cutoff, deduplication, and aggregated totals for efficient, large-scale deletions.
  • Bug Fixes

    • Added early-exit when no expired items remain, ensured job status updates, improved error handling and logging.
  • Documentation

    • Added detailed guidance covering the cleanup approach, edge-case handling, performance trade-offs, safety considerations, and an overview video link.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jan 24, 2026

Walkthrough

Replaces per-token iterative cleanup with batched processing using configurable BATCH_SIZE and RETENTION_DAYS; fetches expired tokens in loops, deduplicates IDs, performs atomic per-batch transactional deletions of relationships, corpus items, entities, and tokens, tracks totals, logs a final summary, and preserves status updates and error handling.

Changes

Cohort / File(s) Summary
Unsubmitted forms cleanup test
tests/unsubmitted_forms/cleanup_unsubmitted_forms.ts
Introduces BATCH_SIZE and RETENTION_DAYS; replaces single-item loop with batched take(BATCH_SIZE) fetch of expired tokens; deduplicates token/entity/product IDs; gathers related IDs; performs a single atomic transaction per batch deleting relationships, corpus items, entities, and tokens in correct FK order; accumulates deletion totals; logs final summary; retains status update to completed and marks failed on error.
Documentation
tests/unsubmitted_forms/README.md
Adds README describing issues fixed, batched transactional solution, trade-offs (batch tuning, memory vs atomicity), safety notes, and a supporting video link.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 I hopped through old tokens, one batch at a time,
Gathered IDs in neat little lines.
A single transaction — a tidy sweep,
Totals counted, logs to keep.
🥕✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change—fixing bugs in the cleanup_unsubmitted_forms scheduled job, which is the primary focus of the changeset including date calculation, query logic, deletion order, and edge case handling.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@Ochiengsteven Ochiengsteven changed the title Refactor cleanup_unsubmitted_forms to conditionally delete entity-related data based on existence of entityId, improving transaction handling. Fix bugs in cleanup_unsubmitted_forms scheduled job Jan 24, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/unsubmitted_forms/cleanup_unsubmitted_forms.ts (1)

44-49: Relationship query lacks direct token link, risking data integrity issues.

The relationship query filters by product_id and status: "new" without directly linking to the current token. If multiple tokens share the same productId, the findFirst() will return an arbitrary relationship, and deleting it could remove data required by other unexpired tokens.

The query should include an additional filter to ensure the relationship is specifically tied to the current token (e.g., token_id: token.id or via the token's entityId if the schema supports it) to guarantee only the correct relationship is deleted.

🧹 Nitpick comments (2)
tests/unsubmitted_forms/cleanup_unsubmitted_forms.ts (2)

51-57: Improve type safety for transaction array.

Using any[] loses type safety. Consider using the proper Prisma type for transaction operations.

♻️ Suggested improvement
-      const deleteOperations: any[] = [
+      const deleteOperations: Prisma.PrismaPromise<unknown>[] = [

You'll need to import Prisma from @prisma/client:

-import type { JobScheduleQueue } from "@prisma/client";
+import type { JobScheduleQueue, Prisma } from "@prisma/client";

43-89: Consider resilient error handling for partial failures.

If a transaction fails for one token (e.g., due to a constraint violation or transient error), the job immediately fails and remaining tokens are not processed. This could leave the cleanup incomplete and require manual intervention.

Consider wrapping individual token processing in a try-catch to continue with remaining tokens, logging failures, and summarizing at the end:

♻️ Suggested approach
+    const failures: Array<{ token: string; error: unknown }> = [];
+
     for (const token of expiredTokens) {
+      try {
         const relationship = await prisma.relationship.findFirst({
           // ... existing code ...
         });

         // ... build deleteOperations ...

         await prisma.$transaction(deleteOperations);
+      } catch (tokenError) {
+        console.error(`Failed to cleanup token ${token.token}:`, tokenError);
+        failures.push({ token: token.token, error: tokenError });
+      }
     }

-    await update_job_status(job.id, "completed");
+    if (failures.length > 0) {
+      console.error(`Cleanup completed with ${failures.length} failures`);
+      await update_job_status(job.id, "failed");
+    } else {
+      await update_job_status(job.id, "completed");
+    }
   } catch (error) {
     console.error("Error cleaning up unsubmitted forms:", error);
     await update_job_status(job.id, "failed");
     throw error;
   }

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@tests/unsubmitted_forms/cleanup_unsubmitted_forms.ts`:
- Around line 34-39: The token query (prisma.publicFormsTokens.findMany fetching
expiredTokens by createdAt) lacks a submission-status check and may remove
tokens for already-submitted forms; update the where clause on
prisma.publicFormsTokens.findMany to include the same filter used when deleting
relationships (e.g., status: "new" or ensure submissionTimestamp is null) so
only unsubmitted tokens are selected, or alternatively verify and document that
tokens are always removed on submission if you choose to keep the age-only
filter.
🧹 Nitpick comments (1)
tests/unsubmitted_forms/cleanup_unsubmitted_forms.ts (1)

47-55: De-duplicate ID lists before IN filters.

This reduces query size and avoids unnecessary parameter bloat for large batches.

♻️ Suggested diff
-    const tokenStrings = expiredTokens.map((t) => t.token);
-    const entityIds = expiredTokens
-      .map((t) => t.entityId)
-      .filter((id): id is string => id !== null && id !== undefined);
-    const productIds = expiredTokens
-      .map((t) => t.productId)
-      .filter((id): id is string => id !== null && id !== undefined);
+    const tokenStrings = [...new Set(expiredTokens.map((t) => t.token))];
+    const entityIds = [
+      ...new Set(
+        expiredTokens
+          .map((t) => t.entityId)
+          .filter((id): id is string => id !== null && id !== undefined)
+      ),
+    ];
+    const productIds = [
+      ...new Set(
+        expiredTokens
+          .map((t) => t.productId)
+          .filter((id): id is string => id !== null && id !== undefined)
+      ),
+    ];

Comment thread tests/unsubmitted_forms/cleanup_unsubmitted_forms.ts Outdated
…submitted tokens older than 7 days, and optimize ID collection for batch deletion by ensuring uniqueness, improving performance and clarity.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@tests/unsubmitted_forms/cleanup_unsubmitted_forms.ts`:
- Around line 76-82: The entity and new_corpus deletions lack the same
"new"/unsubmitted filter used earlier and can race with a submit; inside the
cleanup transaction (in tests/unsubmitted_forms/cleanup_unsubmitted_forms.ts)
update the prisma.new_corpus.deleteMany and prisma.entity.deleteMany calls to
also filter for unsubmitted items (e.g., include status: "new" or submittedAt:
null matching the earlier query) using the same entityIds set, or alternatively
re-query/check submittedAt for those entityIds inside the same transaction (or
obtain a DB-level lock) to guarantee only truly unsubmitted records are deleted.
🧹 Nitpick comments (1)
tests/unsubmitted_forms/cleanup_unsubmitted_forms.ts (1)

67-87: Consider adding guards for empty arrays to avoid unnecessary queries.

When all tokens have null entityId or productId, the corresponding arrays will be empty. While Prisma's { in: [] } safely matches nothing, executing these queries adds unnecessary database round-trips.

♻️ Proposed optimization to skip queries for empty arrays
     // batch delete in a single transaction - order matters for FK constraints
-    await prisma.$transaction([
-      // delete relationships tied to these products that are still "new" (unsubmitted)
-      prisma.relationship.deleteMany({
+    const operations = [];
+
+    // delete relationships tied to these products that are still "new" (unsubmitted)
+    if (productIds.length > 0) {
+      operations.push(prisma.relationship.deleteMany({
         where: {
           product_id: { in: productIds },
           status: "new",
         },
-      }),
-      // delete corpus items before entities (they reference entity_id)
-      prisma.new_corpus.deleteMany({
-        where: { entity_id: { in: entityIds } },
-      }),
-      // delete the entities
-      prisma.entity.deleteMany({
-        where: { id: { in: entityIds } },
-      }),
-      // finally delete the tokens
-      prisma.publicFormsTokens.deleteMany({
-        where: { token: { in: tokenStrings } },
-      }),
-    ]);
+      }));
+    }
+
+    // delete corpus items before entities (they reference entity_id)
+    if (entityIds.length > 0) {
+      operations.push(prisma.new_corpus.deleteMany({
+        where: { entity_id: { in: entityIds } },
+      }));
+      operations.push(prisma.entity.deleteMany({
+        where: { id: { in: entityIds } },
+      }));
+    }
+
+    // always delete the tokens
+    operations.push(prisma.publicFormsTokens.deleteMany({
+      where: { token: { in: tokenStrings } },
+    }));
+
+    await prisma.$transaction(operations);

Comment on lines +76 to +82
prisma.new_corpus.deleteMany({
where: { entity_id: { in: entityIds } },
}),
// delete the entities
prisma.entity.deleteMany({
where: { id: { in: entityIds } },
}),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Minor race condition risk: entity deletion lacks submission status protection.

The relationship deletion correctly filters by status: "new", but the entity and corpus deletions don't have equivalent protection. If a form is submitted between the initial query and the transaction, the relationship would be preserved (due to status changing from "new"), but the entity and corpus items would still be deleted.

This is a narrow window and may be acceptable, but if strict consistency is required, consider re-checking submittedAt inside the transaction or using a database-level lock.

🤖 Prompt for AI Agents
In `@tests/unsubmitted_forms/cleanup_unsubmitted_forms.ts` around lines 76 - 82,
The entity and new_corpus deletions lack the same "new"/unsubmitted filter used
earlier and can race with a submit; inside the cleanup transaction (in
tests/unsubmitted_forms/cleanup_unsubmitted_forms.ts) update the
prisma.new_corpus.deleteMany and prisma.entity.deleteMany calls to also filter
for unsubmitted items (e.g., include status: "new" or submittedAt: null matching
the earlier query) using the same entityIds set, or alternatively re-query/check
submittedAt for those entityIds inside the same transaction (or obtain a
DB-level lock) to guarantee only truly unsubmitted records are deleted.

…improved memory management and performance, ensuring safe deletion of relationships tied to expired entities while maintaining atomic transactions for each batch.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant